gh-144370: Disallow usage of control characters in header names, values and statuses in Lib/wsgiref/handlers.py for security#144371
Conversation
|
Important note: This PR needs backport to 3.10, 3.11, 3.12, 3.13 and 3.14 please, thanks! 👍 (I'll probably add these tomorrow if there are no concerns.) |
|
@gpshead or @StanFromIreland / @ZeroIntensity please review, thanks! |
|
What is different from #144118? |
|
I would suggest that you directly comment on Seth's PR instead. |
|
@picnixz I guess that this was a misunderstanding, this PR is about a completely different thing that is changes. This is not a duplicate of the other PR because the other one is about adding HTAB again to Lib/wsgiref/headers.py, but this one is about Lib/wsgiref/handlers.py, not headers.py. This is something completely different because the other one only adds one character to be allowed, but this one reguards disallowing several characters in handlers.py (and not headers.py) which wasn‘t mentioned at all in the other PR because this concerns completely different aspects like status and some changes in debug mode in handlers.py and not headers.py. Please reopen. Thanks! |
|
@picnixz I guess, I know why this seemed to be a duplicate as I described the „differences“ to the other PR and then you thought that this was about some changes that I suggested for the other PR, but this is not what this PR is about because this is about something completely different in another location, so please reopen. Thanks! |
|
And I think that is would not be a very good idea if we combined these changes in another PR because the first one was already merged two weeks ago and the other one is only about adding allowed phrases again while this one is about the opposit (disallowing phrases) with the only thing in common that this one does not disallow one thing which will be added in the new PR again, but both other PRs concern another location which makes this not even a duplication in that way, so please reopen. Thanks! |
|
Thank you very much! |
vstinner
left a comment
There was a problem hiding this comment.
You should write tests for theses changes.
Lib/wsgiref/handlers.py
Outdated
| "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"] | ||
|
|
||
| _name_disallowed = re.compile(r'[\x00-\x1F\x7F]') | ||
| _value_disallowed = re.compile(r'[\x00-\x08\x0A-\x1F\x7F]') |
There was a problem hiding this comment.
I would prefer to wait until PR gh-144118 is merged, and then get these regexs from wsgiref.headers (line 4).
There was a problem hiding this comment.
Nice idea! One other idea that came to my mind when reading this is that we could theoretically use the same converter for wsgiref.headers and wsgiref.handlers (if we already use the same regexes here), but maybe something will be changed in one and then it's unclear that it's used there too, so probably a good idea to only use these twice because they should (probably) always be updated in the same way and it should be more secure to only have to update them once so that they aren't forgotten in the other place (like they were before I found them in this place), so that's a nice idea for future security, very good engineering!
Unforetunately, I can't add this before the other PR is merged because the tests won't work, so for now I have to let this be the same or theoretically I could also let this the same here and add a suggestion to the other PR so that it includes the regexes from this one because it seems to be urgent to me to merge this as it's a security update while the other PR only adds one character back (which is not that urgent) and then I could just add this as a suggestion to the other one.
What's your opinion on this @vstinner ? Thanks!
Co-authored-by: Victor Stinner <vstinner@python.org>
|
Hi @vstinner thank you very much for your review! I really like your suggestions and I'll make some commits in order to get them in the PR soon, thank you very much! |
Co-authored-by: Victor Stinner <vstinner@python.org>
|
(Added the "re" again because it's included in vstinner's regex and also in sethmlarson's PR so if this will be used in both cases then it's consistent in that way and it would take too much time to change this there as well and the suggestions by seth and vstinner were both with the "re", so I've added it) |
|
I'm not quite sure about the status whether it should be handled as a value or a name, so please correct me if I'm mistaken inthis case (or even none of both? but I don't think that) |
Tests are basically already included by seth's PR, but I can of course add similar ones to handlers as well. |
Please add new tests on handlers. |
|
@vstinner please re-review, thanks! (I'm still not quite sure about it because I haven't tested it (also the tests are not tested) or is this automatically done by the lint check?) |
|
(but even if this is done by the lint check automatically, I'm not quite sure about the handling of the status) |
|
and thanks for the review by the way 👍 |
|
Ahh, this seems actually to be checked automatically by the Tests based on what I've seen! Nice! But I'm not quite sure about the correct classification of the status nevertheless, so please have a look there, thanks! |
|
Ah, wrong way around, names is more strict, sorry |
Lib/wsgiref/handlers.py for security
|
I've now imported the regexes out of headers.py and hope that everything's fine now. 👍 There are two things I want to mention: It now seems to be the case that in debug mode the name and value are checked twice because they are already checked indirectly via headers.py, but I think that this is fine because an extra layer of security if this is changed in any way in future should be positive and I don't think (while it was not tested) that this costs that much of performance (and it should only be relevant in debug mode I guess). And one other thing: I'm not quite sure based up on the security reports whether statuses should be included when disallowing the c0 control characters, but I do think so because they're normally never a good thing and always likely to be used for injections, but I'm still not quite sure about another thing because like we've already said statuses also may include a message (which makes them even more risky for injections I guess, so this fix should be even more relevant I think) which could theoretically contain HTAB and I'm not quite sure about the handling in this case whether it should be handled as a name or a value. I've now set it to the more strict one because this should be safer to go, but it could theoretically cause problems if HTAB should be allowed, but I think that it's probably better to handle it this way and if there is really a important use case for it and it should really be allowed (and I'm mistaken with my idea that it's probably unnecessary and probably even risky to include it) then it would probably be reported via an issue like in seth's case, so I think that this is the best solution (and otherwise we won't even ever know whether it would have been better to disallow it and I think overall based up on the explainations before that it's better to disallow it). |
|
Now should be ready for review, please also write your opinion (if different to mine) to the things mentioned above, thanks! And thanks for helping out and adding some commits @vstinner and @sethmlarson! 👍 |
Misc/NEWS.d/next/Security/2026-01-31-21-56-54.gh-issue-144370.fp9m8t.rst
Outdated
Show resolved
Hide resolved
Co-authored-by: Victor Stinner <vstinner@python.org>
…fp9m8t.rst Co-authored-by: Victor Stinner <vstinner@python.org>
… also status injection
|
Unforetunately since the suggested change the lint seems to fail, I'll investigate on this |
|
Ah, seems like it's because of the ` where on one side is one and on the other are two |
|
It should only have one ` I guess (the other PRs also only include one always, so I think that this is correct) |
|
I've changed the news a little bit to "to prevent injections." because it's not only header, but also status injections and thank you very much for your review @vstinner and thanks for the suggestions, both are good! 👍 |
|
And please also have a look at the comment above concerning the handling of the status and the status message because of HTAB, thanks! @vstinner |
|
Edit: I've added @sethmlarson to ACKS |
Inserting newlines in a status is mostly useful to inject a HTTP header, no? |
| with self.subTest(c0): | ||
| base = BaseHandler() | ||
| headers = [('x','y')] | ||
| self.assertRaises(ValueError, base.start_response, f"{c0}", headers) |
There was a problem hiding this comment.
Please add tests on header keys and header names.
There was a problem hiding this comment.
Ignore my comment if you implement my second comment on only checking status in Lib/wsgiref/handlers.py :-)
| self.status = status | ||
| self.headers = self.headers_class(headers) | ||
| status = self._convert_string_type(status, "Status") | ||
| status = self._convert_string_type(status, "Status", name=True) |
There was a problem hiding this comment.
Would it be possible to only check the status using a regex here to leave _convert_string_type() unchanged?
| with self.subTest(c0): | ||
| base = BaseHandler() | ||
| headers = [('x','y')] | ||
| self.assertRaises(ValueError, base.start_response, f"{c0}", headers) |
There was a problem hiding this comment.
Ignore my comment if you implement my second comment on only checking status in Lib/wsgiref/handlers.py :-)
This PR is opened as public as the primary described CVE is already publicated and this only contains one certain point that has to be changed in code in Python and refers to issue #144370 (gh-144370).
In Lib/wsgiref/handlers.py at 260 (def _convert_string_type(self, value, title):) it is necessary to add the same fix as in #143916 requested as well (credits to @sethmlarson 👍). This references #143917 and #144118 and already includes the latest fix for the inclusion of HTAB, additions to this in form of tests will automatically be added by merging #144118.
There are some differences to wsgiref.headers.Headers that I made because I chose the default status of "name" to be "True" instead of false as "True" includes one more disallowed character for safety in future use cases if not defined otherwise, but I also added "name=True" even if not necessary in use cases just for safety for another case (if the default case was changed in order to produce similarity to wsgiref.headers.Headers while I'd recommend to change wsgiref.headers.Headers). I've also made an addition to the raised "ValueError" including "values" (without a further description of the very exact in- and exclusions like HTAB because this doesn't seem to be user-sided relevant to me) and I've used a slightly shorter internal form for the variables (which can be changed if wanted, but it's just style).
Important note: I was / am not quite sure about these changes as they were not tested, but I'm highly confident that this should be correct as it (mostly) bases on changes which were already made in other PRs (or at least in one other PR because this is the only merged one (but this also refers to a "non-merged" PR, so has to be concretized in that way)). And another important aspect is that I was not quite sure about the "status" (whether my classification is correct here as a "non-name value" when considering the handling of the characters, so please correct me if I'm mistaken (and I wasn't quite sure as well at the question whether potentially there should be added some different disallowed characters for this or some disallowed be removed, so please correct me if I'm mistaken)). Please also correct me if I'm mistaken in general anywhere in this PR (maybe this isn't even necessary for this case and I'm completely mistaken because this wasn't tested, but I'm pretty confident just from looking at the code).